Big Data Regression Using Tree Based Segmentation
Abstract
Scaling regression to large datasets is a common problem in many application areas. We propose a two-step approach. In the first step, a regression tree (CART) segments the large dataset; in the second, a suitable regression model is developed for each segment. Because the individual segments are much smaller than the full dataset, sophisticated regression techniques can be applied to them where required. An attractive property of this approach is that it can yield models with both good explanatory power and good predictive performance. Ensemble methods such as Gradient Boosted Trees can offer excellent predictive performance but may not produce interpretable models. In the experiments reported in this study, the predictive performance of the proposed approach matched that of Gradient Boosted Trees.
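The two-step procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes NumPy, grows a shallow CART-style tree using the usual squared-error split criterion (step 1), and then fits ordinary least squares with an intercept inside each leaf (step 2). OLS is only a stand-in, since the abstract leaves the per-segment model open. The names `SegmentedRegressor`, `max_depth`, and `min_leaf` are hypothetical.

```python
import numpy as np


def _best_split(X, y, min_leaf):
    """Return (sse, feature, threshold) for the CART split minimising the
    children's summed squared error, or None if no admissible split exists."""
    n, d = X.shape
    best = None
    for j in range(d):
        order = np.argsort(X[:, j], kind="stable")
        xs, ys = X[order, j], y[order]
        csum, csq = np.cumsum(ys), np.cumsum(ys ** 2)
        i = np.arange(min_leaf, n - min_leaf + 1)  # candidate left-child sizes
        if i.size == 0:
            continue
        ls, lq = csum[i - 1], csq[i - 1]
        rs, rq = csum[-1] - ls, csq[-1] - lq
        # SSE of a constant fit per child: sum(y^2) - (sum y)^2 / count
        sse = (lq - ls ** 2 / i) + (rq - rs ** 2 / (n - i))
        # Only cut between distinct feature values.
        sse = np.where(xs[i - 1] < xs[i], sse, np.inf)
        k = int(np.argmin(sse))
        if np.isfinite(sse[k]) and (best is None or sse[k] < best[0]):
            # Threshold = largest value sent left, so "x <= t" reproduces the cut.
            best = (sse[k], j, xs[i[k] - 1])
    return best


def _fit_ols(X, y):
    """Per-segment model (assumed): ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def _build(X, y, depth, max_depth, min_leaf, leaves):
    """Step 1 (segmentation) recursively; each leaf gets its own step-2 model."""
    split = _best_split(X, y, min_leaf) if depth < max_depth else None
    if split is None:
        leaves.append(_fit_ols(X, y))
        return ("leaf", len(leaves) - 1)
    _, j, t = split
    left = X[:, j] <= t
    return ("node", j, t,
            _build(X[left], y[left], depth + 1, max_depth, min_leaf, leaves),
            _build(X[~left], y[~left], depth + 1, max_depth, min_leaf, leaves))


def _leaf_id(node, x):
    """Route one row down the tree to its segment id."""
    while node[0] == "node":
        _, j, t, left, right = node
        node = left if x[j] <= t else right
    return node[1]


class SegmentedRegressor:
    """Hypothetical sketch: CART segmentation followed by one model per segment."""

    def __init__(self, max_depth=3, min_leaf=50):
        self.max_depth = max_depth
        self.min_leaf = max(1, min_leaf)  # minimum rows per segment

    def fit(self, X, y):
        self.leaves = []  # coefficient vectors, indexed by leaf id
        self.tree = _build(np.asarray(X, float), np.asarray(y, float), 0,
                           self.max_depth, self.min_leaf, self.leaves)
        return self

    def predict(self, X):
        X = np.asarray(X, float)
        out = np.empty(len(X))
        for r, x in enumerate(X):
            coef = self.leaves[_leaf_id(self.tree, x)]
            out[r] = coef[0] + x @ coef[1:]
        return out


# Demo on synthetic data: two regimes with different linear relationships.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(400, 2))
y = np.where(X[:, 0] <= 0.5, 2.0 * X[:, 1] + 10.0, -3.0 * X[:, 1])
model = SegmentedRegressor(max_depth=1, min_leaf=20).fit(X, y)
print(model.predict(np.array([[0.2, 0.5], [0.8, 0.5]])))  # approximately [11.0, -1.5]
```

In this toy example the tree places its single split on the first feature at the regime boundary, and each leaf's linear model then recovers its own intercept and slope, which one global linear fit could not do. Each leaf model also stays directly interpretable. On genuinely large data, a library tree (for example scikit-learn's `DecisionTreeRegressor`, whose `apply` method returns leaf indices) would more likely replace the hand-written splitter.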
Similar resources
Object-Based Classification of UltraCamD Imagery for Identification of Tree Species in the Mixed Planted Forest
This study is a contribution to assess the high resolution digital aerial imagery for semi-automatic analysis of tree species identification. To maximize the benefit of such data, the object-based classification was conducted in a mixed forest plantation. Two subsets of an UltraCam D image were geometrically corrected using aero-triangulation method. Some appropriate transformations were perfor...
Classifying the Customers of Telecommunication Company in order to Identify Profitable Customers Based on Their First Transaction, Using Decision Tree: A Case Study of System 780
Effective knowledge and awareness of customers require the market segmentation, through which the customers who have the same needs and purchasing patterns as well as the same response to marketing plans are identified. The selection of a proper variable is a requirement, among others, for a successful market segmentation. In today's world, on one hand, the consumers are bombarded with new goods ...
A Study to Improve the Response in Email Campaigning by Comparing Data Mining Segmentation Approaches in Aditi Technologies
Email marketing is increasingly recognized as an effective Internet marketing tool. In this study, a questionnaire is constructed and distributed to a sample of 146 prospects of Aditi Technologies to find the factors associated with higher response rates. The collected data is analyzed using Factor Analysis and the 11 factors, From Line, Subject Line, Personalization of the subject line, Timing...
Intrathoracic Airway Tree Segmentation from CT Images Using a Fuzzy Connectivity Method
Introduction: Virtual bronchoscopy is a reliable and efficient diagnostic method for primary symptoms of lung cancer. The segmentation of airways from CT images is a critical step for numerous virtual bronchoscopy applications. Materials and Methods: To overcome the limitations of the fuzzy connectedness method, the proposed technique, called fuzzy connectivity - fuzzy C-mean (FC-FCM), utilized...
Tackling Simpson's Paradox in Big Data using Classification & Regression Trees
This work is aimed at finding potential Simpson’s paradoxes in Big Data. Simpson’s paradox (SP) arises when choosing the level of data aggregation for causal inference. It describes the phenomenon where the direction of a cause on an effect is reversed when examining the aggregate vs. disaggregates of a sample or population. The practical decision making dilemma that SP raises is which level of...
Journal title:
- CoRR
Volume: abs/1707.07409
Publication date: 2017